Implementation of directed acyclic word graph

Author

  • Miroslav Balík
Abstract

An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text T is a minimal automaton that accepts all substrings of T, so it represents a complete index of the text. Whereas the usual implementations of a DAWG need storage about 30 times the size of the text, the implementation shown here reduces this requirement to four times the size of the text. The method compresses the DAWG elements, i.e. vertices, edges and labels. Construction time is linear in the size of the text, and a search for a specific pattern takes time linear in the size of the pattern. The implementation therefore preserves both good properties of the DAWG automaton.
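To make the two properties named in the abstract concrete, the following is a minimal sketch of an uncompressed DAWG (suffix automaton) built with the standard on-line construction. It is not the compressed representation the paper describes; the class name `SuffixAutomaton` and its methods are illustrative assumptions, and transitions are stored in plain Python dictionaries rather than in a packed layout.

```python
class SuffixAutomaton:
    """Uncompressed DAWG for one text (illustrative sketch, not the paper's layout)."""

    def __init__(self, text):
        # Parallel lists, one entry per state; state 0 is the initial state.
        self.next = [{}]    # outgoing edges: character -> target state
        self.link = [-1]    # suffix links (-1 means "none")
        self.length = [0]   # length of the longest string reaching the state
        self.last = 0       # state that represents the whole text read so far
        for ch in text:
            self._extend(ch)

    def _new_state(self, length, edges, link):
        self.next.append(edges)
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def _extend(self, ch):
        # Append one character; the total work over the text is amortized linear.
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = self._new_state(self.length[p] + 1,
                                        dict(self.next[q]), self.link[q])
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        # One transition per pattern character: time linear in len(pattern).
        state = 0
        for ch in pattern:
            state = self.next[state].get(ch)
            if state is None:
                return False
        return True
```

For example, `SuffixAutomaton("abcbc").contains("cbc")` follows three edges from the initial state and returns `True`, while `contains("cc")` fails on the second character. The per-state dictionaries and integer lists in this sketch are exactly the kind of overhead that a compressed encoding of vertices, edges and labels is meant to reduce.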


Similar resources

Direct Construction of Compact Directed Acyclic Word Graphs

The Directed Acyclic Word Graph (DAWG) is an efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by D...


On Compact Directed Acyclic Word Graphs

The Directed Acyclic Word Graph (DAWG) is a space-efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used...


Ternary Directed Acyclic Word Graphs

Given a set S of strings, a DFA accepting S offers a very time-efficient solution to the pattern matching problem over S. The key is how to implement such a DFA in the trade-off between time and space, and especially the choice of how to implement the transitions of each state is critical. Bentley and Sedgewick proposed an effective tree structure called ternary trees. The idea of ternary trees...


On-Line Construction of Compact Directed Acyclic Word Graphs

A Compact Directed Acyclic Word Graph (CDAWG) is a space-efficient text indexing structure that can be used in several different string algorithms, especially in the analysis of biological sequences. In this paper, we present a new on-line algorithm for its construction, as well as for the construction of a CDAWG for a set of strings.


Sparse Directed Acyclic Word Graphs

The suffix tree of a string w is a text indexing structure that represents all suffixes of w. A sparse suffix tree of w represents only a subset of the suffixes of w. An application of sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version ...



Journal:
  • Kybernetika

Volume 38


Publication date: 2002